New Environments Set the Stage for Changing Tastes in Mates

Authors

  • Anshul Kundaje
  • Manuel Middendorf
  • Mihir Shah
  • Chris H Wiggins
  • Yoav Freund
  • Christina Leslie
Abstract

Background: We have recently introduced a predictive framework for studying gene transcriptional regulation in simpler organisms using a novel supervised learning algorithm called GeneClass. GeneClass is motivated by the hypothesis that in model organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular microarray experiment based on the presence of binding site subsequences ("motifs") in the gene's regulatory region and the expression levels of regulators such as transcription factors in the experiment ("parents"). GeneClass formulates the learning task as a classification problem: predicting +1 and -1 labels corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. Using the AdaBoost algorithm, GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree.

Methods: In the current work, we introduce a new, robust version of the GeneClass algorithm that increases stability and computational efficiency, yielding a more scalable and reliable predictive model. The improved stability of the prediction tree enables us to introduce a detailed post-processing framework for biological interpretation, including individual and group target gene analysis to reveal condition-specific regulation programs and to suggest signaling pathways. Robust GeneClass uses a novel stabilized variant of boosting that allows a set of correlated features, rather than single features, to be included at nodes of the tree; in this way, biologically important features that are correlated with the single best feature are retained rather than decorrelated and lost in the next round of boosting. Other computational developments include fast matrix computation of the loss function for all features, allowing scalability to large datasets, and the use of abstaining weak rules, which results in a shallower and more interpretable tree. We also show how to incorporate genome-wide protein-DNA binding data from ChIP-chip experiments into the GeneClass algorithm, and we use an improved noise model for gene expression data.

Results: Using the improved scalability of Robust GeneClass, we present larger-scale experiments on a yeast environmental stress dataset, training and testing on all genes and using a comprehensive set of potential regulators. We demonstrate the improved stability of the features in the learned prediction tree, and we show the utility of the post-processing framework by analyzing two groups of genes in yeast (the protein chaperones and a set of putative targets of the Nrg1 and Nrg2 transcription factors) and suggesting novel hypotheses about their transcriptional and post-transcriptional regulation. Detailed results and Robust GeneClass source code are available for download from http://www.cs.columbia.edu/compbio/robust-geneclass.

From the NIPS workshop on New Problems and Methods in Computational Biology, Whistler, Canada, 18 December 2004. Published: 20 March 2006. BMC Bioinformatics 2006, 7(Suppl 1):S5. doi:10.1186/1471-2105-7-S1-S5. Proceedings: Gal Chechik, Christina Leslie, Gunnar Rätsch, Koji Tsuda. http://www.biomedcentral.com/content/pdf/14712105-7-S1-info.pdf
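The idea of computing the boosting loss for all features with matrix operations, combined with abstaining weak rules, can be illustrated with a minimal sketch. This is not the Robust GeneClass source code: the function names, array layouts, and the `eps` smoothing term are hypothetical. It assumes that each candidate rule pairs one motif with one parent state, predicts only for gene-experiment pairs where both are present, and abstains otherwise, and it uses the standard confidence-rated boosting loss for abstaining rules, Z = W0 + 2·sqrt(W+ · W-).

```python
import numpy as np

def abstaining_rule_losses(M, P, Y, W, eps=1e-12):
    """Score every (motif, parent-state) rule in one pass.

    M: (genes, motifs) 0/1 motif presence in each gene's regulatory region
    P: (experiments, parent_states) 0/1 indicator that a regulator is in a state
    Y: (genes, experiments) labels in {+1, -1, 0}; 0 marks baseline, ignored
    W: (genes, experiments) nonnegative boosting weights
    Returns (Z, alpha), each of shape (motifs, parent_states).
    """
    w_pos = W * (Y == 1)    # weights of up-regulated examples
    w_neg = W * (Y == -1)   # weights of down-regulated examples

    # A rule fires on (gene, experiment) when the motif is present in the gene
    # and the parent state holds in the experiment, so the covered weight for
    # all rules at once is a pair of matrix products.
    cov_pos = M.T @ w_pos @ P
    cov_neg = M.T @ w_neg @ P

    # Weight of labeled examples on which the rule abstains.
    abstain = w_pos.sum() + w_neg.sum() - cov_pos - cov_neg

    Z = abstain + 2.0 * np.sqrt(cov_pos * cov_neg)
    # eps (an assumption of this sketch) keeps the log finite for pure rules.
    alpha = 0.5 * np.log((cov_pos + eps) / (cov_neg + eps))
    return Z, alpha

def best_rule(Z):
    """Return the (motif index, parent-state index) with the smallest loss."""
    m, p = np.unravel_index(np.argmin(Z), Z.shape)
    return int(m), int(p)
```

Because abstaining rules leave uncovered examples untouched, the two matrix products are enough to score every motif-parent pair, which is one plausible reading of how such a loss scales to large datasets.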

Similar resources

Ranking Efficient Decision Making Units in Data Envelopment Analysis based on Changing Reference Set

One of the drawbacks of Data Envelopment Analysis (DEA) is the lack of discrimination among efficient Decision Making Units (DMUs). One method for addressing this difficulty, called changing the reference set, was proposed by Jahanshahloo et al. (2007). The method has some drawbacks. In this paper, a modified method and a new method to overcome these problems are suggested. The main advantage of...

The Time Adaptive Self Organizing Map for Distribution Estimation

The feature map represented by the set of weight vectors of the basic SOM (Self-Organizing Map) provides a good approximation to the input space from which the sample vectors come. But the time-decreasing learning rate and neighborhood function of the basic SOM algorithm reduce its capability to adapt weights for a varied environment. In dealing with non-stationary input distributions and changi...

A New Method for Ranking Extreme Efficient DMUs Based on Changing the Reference Set with Using L2 - Norm

The purpose of this study is to utilize a new method for ranking extreme efficient decision making units (DMUs) based upon the omission of these efficient DMUs from the reference set of inefficient and non-extreme efficient DMUs in data envelopment analysis (DEA) models with constant and variable returns to scale. In this method, an L2-norm is used and it is believed that it doesn't have any e...

A context-sensitive dynamic role-based access control model for pervasive computing environments

Resources and services are accessible in pervasive computing environments from anywhere and at any time. Also, due to the ever-changing nature of such environments, the identity of users is unknown. However, users must be able to access the required resources based on their contexts. These and other similar complexities necessitate dynamic and context-aware access control models for such environmen...

A New Algorithm for the Deinterleaving of Radar Pulses

This paper presents a new algorithm for the deinterleaving of radar signals, based on the direction of arrival (DOA), carrier frequency (RF), and time of arrival (TOA). The algorithm is applied to classic (constant), jitter, staggered, and dwell switch pulse repetition interval (PRI) signals. This algorithm consists of two stages. In the first stage, a Kohonen neural network clusters the receiv...

Presenting a New Model for Bank’s Supply Chain Performance Evaluating with DEA Solution Approach

Data Envelopment Analysis (DEA) is a method for measuring the efficiency of peer decision making units (DMUs) with multiple inputs and outputs. The traditional DEA treats decision making units under evaluation as black boxes and calculates their efficiencies with first inputs and last outputs. This carries the notion of missing some intermediate measures in the process of changing the inputs to...

Journal title:
  • PLoS Biology

Volume 3  Issue 

Pages  -

Publication date 2005